.\" $Id: bw_mem.8 1.4 00/10/16 17:13:36+02:00 staelin@hpli8.hpli.hpl.hp.com $
.TH BW_MEM 8 "$Date: 00/10/16 17:13:36+02:00 $" "(c)1994-2000 Larry McVoy and Carl Staelin" "LMBENCH"
.SH NAME
bw_mem \- time memory bandwidth
.SH SYNOPSIS
.B bw_mem
[
.I "-P <parallelism>"
]
[
.I "-W <warmups>"
]
[
.I "-N <repetitions>"
]
.I size
.I rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy
.I [align]
.SH DESCRIPTION
.B bw_mem
allocates twice the specified amount of memory, zeros it, and then times
the copying of the first half to the second half.  Results are reported
in megabytes moved per second.
.LP
The size
specification may end with ``k'' or ``m'' to mean
kilobytes (* 1024) or megabytes (* 1024 * 1024).
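.LP
The suffix rule can be sketched in C as follows.
This is a minimal illustration, not the parser used by
.BR bw_mem ;
the name \fIparse_size\fP is hypothetical, and only the lowercase
suffixes described above are handled.
.sp
.nf
.ft CB
#include <stdlib.h>

/* Hypothetical helper: convert "8m", "512k" or "4096" to bytes. */
size_t parse_size(const char *s)
{
	char *end;
	size_t n = (size_t)strtoul(s, &end, 10);

	if (*end == 'k')
		n *= 1024;		/* kilobytes */
	else if (*end == 'm')
		n *= 1024 * 1024;	/* megabytes */
	return n;
}
.ft
.fi
.LP
Under this sketch a size of 8m corresponds to 8388608 bytes,
and a size given without a suffix is taken as a plain byte count.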
.SH OUTPUT
Output format is \f(CB"%0.2f %.2f\\n", megabytes, megabytes_per_second\fP, i.e.,
.sp
.ft CB
8.00 25.33
.ft
.LP
There are nine different memory benchmarks in
.BR bw_mem .
They each measure slightly different methods for reading, writing or
copying data.
.TP
.B "rd"
measures the time to read data into the processor.  It computes the
sum of an array of integer values.  It accesses every fourth word.
.TP
.B "wr"
measures the time to write data to memory.  It assigns a constant
value to each element of an array of integer values.
It accesses every fourth word.
.TP
.B "rdwr"
measures the time to read data from memory and then write data back to
the same memory location.  For each element in an array it adds
the current value to a running sum before assigning a new (constant)
value to the element.
It accesses every fourth word.
.TP
.B "cp"
measures the time to copy data from one location to another.  It
does an array copy: dest[i] = source[i].
It accesses every fourth word.
.TP
.B "frd"
measures the time to read data into the processor.  It computes the
sum of an array of integer values.
.TP
.B "fwr"
measures the time to write data to memory.  It assigns a constant
value to each element of an array of integer values.
.TP
.B "fcp"
measures the time to copy data from one location to another.  It
does an array copy: dest[i] = source[i].
.TP
.B "bzero"
measures how fast the system can
.I bzero
memory.
.TP
.B "bcopy"
measures how fast the system can
.I bcopy
data.
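.LP
The difference between the strided and full variants can be
sketched in C as follows.
This is illustrative only: the function names are hypothetical, the
loops in
.B bw_mem
itself may be written differently (for example, unrolled), and the
sketch assumes that the f variants touch every word, as their
descriptions suggest by omitting the every-fourth-word remark.
.sp
.nf
.ft CB
#include <stddef.h>

/* rd-style kernel: sum every fourth word of the array. */
int sum_every_fourth(const int *a, size_t n)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i += 4)
		sum += a[i];
	return sum;
}

/* frd-style kernel (assumed): sum every word of the array. */
int sum_all(const int *a, size_t n)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += a[i];
	return sum;
}

/* fcp-style kernel (assumed): dest[i] = source[i] for every word. */
void copy_all(int *dest, const int *source, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dest[i] = source[i];
}
.ft
.fi
.LP
For an array holding the values 1 through 8, the strided sum visits
elements 0 and 4 and returns 6, while the full sum returns 36.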
.SH MEMORY UTILIZATION
This benchmark can move up to three times the requested memory.
Bcopy will use 2-3 times as much memory bandwidth:
there is one read from the source and a write to the destination.  The
write usually results in a cache line read and then a write back of
the cache line at some later point.  Memory utilization might be reduced
by 1/3 if the processor architecture implemented ``load cache line''
and ``store cache line'' instructions (as well as ``getcachelinesize'').
.SH "SEE ALSO"
lmbench(8).
.SH "AUTHOR"
Carl Staelin and Larry McVoy
.PP
Comments, suggestions, and bug reports are always welcome.
